| Ref # | Hits | Search Query | DBs | Default Operator | Plurals | Time Stamp |
|---|---|---|---|---|---|---|
| L1 | 12 | (metasearch or (meta adj search)) and wrapper and (search adj engine) | US-PGPUB; USPAT; EPO; JPO; DERWENT; IBM_TDB | OR | ON | 2005/07/18 07:22 |
| L2 | 63 | wrapper same (search adj engine) | US-PGPUB; USPAT; EPO; JPO; DERWENT; IBM_TDB | OR | ON | 2005/07/18 07:22 |
| L3 | [illegible] | wrappers! same (search adj engines!) | US-PGPUB; USPAT; EPO; JPO; DERWENT; IBM_TDB | OR | ON | 2005/07/18 07:27 |
| L4 | 0 | (wrapper or wrapp$4) same (componet adj object adj model) | US-PGPUB; USPAT; EPO; JPO; DERWENT; IBM_TDB | OR | ON | 2005/07/18 07:27 |
| L5 | 49 | (wrapper or wrapp$4) same (component adj object adj model) | US-PGPUB; USPAT; EPO; JPO; DERWENT; IBM_TDB | OR | ON | 2005/07/18 07:35 |
| L6 | 2 | ((wrapper or wrapp$4) and (component adj object adj model)) same search$6 | US-PGPUB; USPAT; EPO; JPO; DERWENT; IBM_TDB | OR | ON | 2005/07/18 07:35 |
| L7 | [illegible] | ((wrapper or wrapp$4) and (component adj object adj model)) same (metasearch or (meta adj search) or search$6) | US-PGPUB; USPAT; EPO; JPO; DERWENT; IBM_TDB | OR | ON | 2005/07/18 07:36 |
| L8 | 446 | ((wrapper or wrapp$4) and (component adj object adj model)) and (metasearch or (meta adj search) or search$6) | US-PGPUB; USPAT; EPO; JPO; DERWENT; IBM_TDB | OR | ON | 2005/07/18 07:43 |
| L9 | [illegible] | 8 and "707"/$.ccls. | US-PGPUB; USPAT; EPO; JPO; DERWENT; IBM_TDB | OR | ON | 2005/07/18 07:36 |



Search History 7/18/05 7:45:05 AM Page 1 

C:\Documents and Settings\ALy\My Documents\EAST\Workspaces\default08182004.wsp 



| Ref # | Hits | Search Query | DBs | Default Operator | Plurals | Time Stamp |
|---|---|---|---|---|---|---|
| L10 | 30 | 9 and (metasearch or (meta adj search) or (search$6 adj engine)) | US-PGPUB; USPAT; EPO; JPO; DERWENT; IBM_TDB | OR | ON | 2005/07/18 07:37 |
| L11 | 27 | 9 and (metasearch or (meta adj search) or (search$6 adj engines!)) | US-PGPUB; USPAT; EPO; JPO; DERWENT; IBM_TDB | OR | ON | 2005/07/18 07:38 |
| L12 | 1 | 11 and (results! with (search$6 adj engines!)) | US-PGPUB; USPAT; EPO; JPO; DERWENT; IBM_TDB | OR | ON | 2005/07/18 07:38 |
| L13 | 7 | 11 and (results! same (search$6 adj engines!)) | US-PGPUB; USPAT; EPO; JPO; DERWENT; IBM_TDB | OR | ON | 2005/07/18 07:40 |
| L14 | 4 | 8 and (various or disparat$6 or different or vary) near9 (search adj engines!) | US-PGPUB; USPAT; EPO; JPO; DERWENT; IBM_TDB | OR | ON | 2005/07/18 07:43 |
| L15 | 446 | ((wrapper or wrapp$4) and (component adj object adj model)) and (search$6) | US-PGPUB; USPAT; EPO; JPO; DERWENT; IBM_TDB | OR | ON | 2005/07/18 07:44 |
| L16 | 4 | 15 and (various or disparat$6 or different or vary) near3 (search adj engines!) | US-PGPUB; USPAT; EPO; JPO; DERWENT; IBM_TDB | OR | ON | 2005/07/18 07:44 |
| L17 | [illegible] | ((wrapper or wrapp$4) same (component adj object adj model)) and (search$6) | US-PGPUB; USPAT; EPO; JPO; DERWENT; IBM_TDB | OR | ON | 2005/07/18 07:44 |
| L18 | 1 | 17 and (various or disparat$6 or different or vary) near3 (search adj engines!) | US-PGPUB; USPAT; EPO; JPO; DERWENT; IBM_TDB | OR | ON | 2005/07/18 07:44 |
| L19 | [illegible] | 17 and (search adj engines!) | US-PGPUB; USPAT; EPO; JPO; DERWENT; IBM_TDB | OR | ON | 2005/07/18 07:44 |






Results (page 2): +metasearch +wrapper +component object model +search engine






Terms used 

metasearch wrapper component object model search engine 






Results 21 - 40 of 200 
Best 200 shown 






Found 865 of 157,873 




21 Search^ 

architectural approach 

Erik Boertjes, Willem Jonker, Jeroen Wijnands 

September 2001 Proceedings of the 2001 ACM workshops on Multimedia: multimedia 
information retrieval 

Full text available: pdf(599.94 KB) Additional Information: full citation, abstract, references, index terms

This paper presents a scalable and extendable architecture consisting of the essential 
building blocks for multimedia information services. It provides building blocks for 
multimedia transport, storage, retrieval, filtering, and presentation, together with their 
interdependencies. After presenting the overall architecture, we focus in more detail on the 
3-level modeling and querying of multimedia data. Emphasis is placed on the support for a 
wide variety of modeling and querying techniques in th ... 

Keywords: information management, metadata management, multimedia search, 
multimedia services, platform architectures, query processing 



22 Editorial: special issue on Web content mining
Bing Liu, Kevin Chen-Chuan Chang

December 2004 ACM SIGKDD Explorations Newsletter, Volume 6 issue 2

Full text available: pdf(178.32 KB) Additional Information: full citation, abstract, references

With the phenomenal growth of the Web, there is an ever-increasing volume of data and
information published in numerous Web pages. The research in Web mining aims to develop
new techniques to effectively extract and mine useful knowledge or information from these
Web pages [8]. Due to the heterogeneity and lack of structure of Web data, automated
discovery of targeted or unexpected knowledge/information is a challenging task. It calls for 
novel methods that draw from a wide range of fields spanni ... 

23 A mediation infrastructure for digital library services
Sergey Melnik, Hector Garcia-Molina, Andreas Paepcke 

June 2000 Proceedings of the fifth ACM conference on Digital libraries 

Full text available: pdf(155.30 KB) Additional Information: full citation, abstract, references, citings, index terms

Digital library mediators allow interoperation between diverse information services. In this 
paper we describe a flexible and dynamic mediator infrastructure that allows mediators to
be composed from a set of modules ("blades"). Each module implements a particular
mediation function, such as protocol translation, query translation, or result merging. All 
the information used by the mediator, including the mediator logic itself, is represented by 
an RDF graph. We i ... 

Keywords: component design, interoperability, mediator, wrapper 



24 Performance and cost tradeoffs in Web search
Nick Craswell, Francis Crimmins, David Hawking, Alistair Moffat 

January 2004 Proceedings of the fifteenth conference on Australasian database - 
Volume 27 

Full text available: pdf(153.92 KB) Additional Information: full citation, abstract, references

Web search engines crawl the web to fetch the data that they index. In this paper we re- 
examine that need, and evaluate the network costs associated with data acquisition, and 
alternative ways in which a search service might be supported. As a concrete example, we 
make use of the Research Finder search service provided at http://rf.panopticsearch.com, 
and information derived from its crawl and query logs. Based upon an analysis of the 
Research Finder system we introduce a hybrid arrangement, in ... 

Keywords: Web crawling, World-Wide Web, information retrieval, metasearch, search 
engine 



25 Tools and approaches for developing data-intensive Web applications: a survey
Piero Fraternali 

September 1999 ACM Computing Surveys (CSUR), Volume 31 issue 3 

Full text available: pdf(524.80 KB) Additional Information: full citation, abstract, references, citings, index terms

The exponential growth and capillar diffusion of the Web are nurturing a novel generation of 
applications, characterized by a direct business-to-customer relationship. The development 
of such applications is a hybrid between traditional IS development and Hypermedia 
authoring, and challenges the existing tools and approaches for software production. This 
paper investigates the current situation of Web development tools, both in the commercial 
and research fields, by identifying and characte ... 



Keywords: HTML, Intranet, WWW, application, development 



26 Session 9A: applications in commerce: IntelliShopper: a proactive, personal, private
shopping assistant

Filippo Menczer, W. Nick Street, Narayan Vishwakarma, Alvaro E. Monge, Markus Jakobsson 
July 2002 Proceedings of the first international joint conference on Autonomous 
agents and multiagent systems: part 3 

Full text available: pdf(383.25 KB) Additional Information: full citation, abstract, references, index terms

The IntelliShopper is a shopping assistant designed to empower consumers. It is a personal 
assistant in that it observes the users while shopping and learns their preferences with 
respect to various features that characterize shopping items. It is proactive in that it 
remembers the users' requests and autonomously monitors vendor sites for new items that 
might match the users' needs and preferences. Finally, it protects users' privacy by means 
of pseudonymity, IP anonymizing, and trusted filte ...

Keywords: learning, monitoring, personalization, privacy, pro-activity, shopping 
27 Web-based specification and integration of legacy services
Ying Zou, Kostas Kontogiannis 

November 2000 Proceedings of the 2000 conference of the Centre for Advanced Studies 
on Collaborative research 

Full text available: pdf(279.28 KB) Additional Information: full citation, abstract, references, index terms

With the explosive growth of the Internet, businesses of all sizes aim on applying 
networkwide solutions to their IT infrastructures, migrating their legacy business processes 
into web-based environments, and establishing their own on-line services. To facilitate 
process and service integration, a complete and information rich service description 
language, is essential for server processes to be specified and for client processes to be 
able to locate services that are available in Web-enabled re ... 

28 Web site engineering: Enforcing strict model-view separation in template engines 
Terence John Parr 

May 2004 Proceedings of the 13th international conference on World Wide Web 

Full text available: pdf(118.03 KB) Additional Information: full citation, abstract, references, index terms

The mantra of every experienced web application developer is the same: thou shalt 
separate business logic from display. Ironically, almost all template engines allow violation 
of this separation principle, which is the very impetus for HTML template engine 
development. This situation is due mostly to a lack of formal definition of separation and 
fear that enforcing separation emasculates a template's power. I show that not only is strict 
separation a worthy design principle, but that we c ... 

Keywords: model-view-controller, template engine, web application 



29 Locating and accessing data repositories with WebSemantics

George A. Mihaila, Louiqa Raschid, Anthony Tomasic 

August 2002 The VLDB Journal — The International Journal on Very Large Data Bases, 

Volume 11 Issue 1 

Full text available: pdf(130.81 KB) Additional Information: full citation, abstract, citings, index terms

Many collections of scientific data in particular disciplines are available today on the World 
Wide Web. Most of these data sources are compliant with some standard for interoperable 
access. In addition, sources may support a common semantics, i.e., a shared meaning for 
the data types and their domains. However, sharing data among a global community of 
users is still difficult because of the following reasons: (i) data providers need a mechanism 
for describing and publishing available sources of ... 

Keywords: Data discovery, Data integration, Mediators, Query languages, World Wide 
Web, XML 



30 Digital libraries in ...

the LEBONED approach 

Frank Oldenettel, Michael Malachinski, Dennis Reil 

May 2003 Proceedings of the 3rd ACM/IEEE-CS joint conference on Digital libraries 

Full text available: pdf(299.24 KB) Additional Information: full citation, abstract, references, index terms

This paper presents the project LEBONED that focuses on the integration of digital libraries 
and their contents into web-based learning environments. We describe in general how the 
architecture of a standard learning management system has to be modified to enable the 
integration of digital libraries. An important part of this modification is the LEBONED 
Metadata Architecture which depicts the handling of metadata and documents imported
from digital libraries. The main components of this architec ... 

31 Database techniques for the World-Wide Web: a survey 
Daniela Florescu, Alon Levy, Alberto Mendelzon 
September 1998 ACM SIGMOD Record, volume 27 issue 3 

Full text available: pdf (size illegible) Additional Information: full citation, citings, index terms



32 Streams, structures, spaces, scenarios, societies (5s): A formal model for digital 
libraries 

Marcos André Gonçalves, Edward A. Fox, Layne T. Watson, Neill A. Kipp

April 2004 ACM Transactions on Information Systems (TOIS), volume 22 issue 2 

Full text available: pdf(316.65 KB) Additional Information: full citation, abstract, references, citings, index terms, review

Digital libraries (DLs) are complex information systems and therefore demand formal 
foundations lest development efforts diverge and interoperability suffers. In this article, we 
propose the fundamental abstractions of Streams, Structures, Spaces, Scenarios, and 
Societies (5S), which allow us to define digital libraries rigorously and usefully. Streams are 
sequences of arbitrary items used to describe both static and dynamic (e.g., video) content. 
Structures can be viewed as labeled directed gra ... 



Keywords: applications, definitions, foundations, taxonomy



33 Workshop..^ B 

May 1998 ACM SIGSOFT Software Engineering Notes, volume 23 issue 3 
Full text available: pdf (size illegible) Additional Information: full citation, index terms



34 DocumenLM B 
Nkechi Nnadi, Michael Bieber 

October 2004 Proceedings of the 2004 ACM symposium on Document engineering 

Full text available: pdf(314.56 KB) Additional Information: full citation , abstract, references , index terms 

This research's primary contribution is providing a relatively straightforward sustainable 
infrastructure for integrating documents and services. Users see a totally integrated 
environment. The integration infrastructure generates supplemental link anchors. Selecting 
one generates a list of relevant links automatically through the use of relationship rules. 



Keywords: automatic link generation, metainformation, relationship rules, service 
integration 



35 Computing curricula 2001 

September 2001 Journal on Educational Resources in Computing (JERIC) 

Full text available: pdf(613.53 KB) Additional Information: full citation, references, citings, index terms
36 Template-based wrappers in the TSIMMIS system
Joachim Hammer, Hector Garcia-Molina, Svetlozar Nestorov, Ramana Yerneni, Marcus Breunig,
Vasilis Vassalos 

June 1997 ACM SIGMOD Record , Proceedings of the 1997 ACM SIGMOD international 

conference on Management of data, volume 26 issue 2 

Full text available: pdf(577.12 KB) Additional Information: full citation, abstract, references, citings, index terms

In order to access information from a variety of heterogeneous information sources, one 
has to be able to translate queries and data from one data model into another. This 
functionality is provided by so-called (source) wrappers [4,8] which convert queries into 
one or more commands/queries understandable by the underlying source and transform the 
native results into a format understood by the application. As part of the TSIMMIS project 
[1, 6] we have developed hard-coded wr ... 

37 A visual approach to multimedia querying and presentation

Isabel F. Cruz, Wendy T. Lucas 

November 1997 Proceedings of the fifth ACM international conference on Multimedia 

Full text available: pdf(1.54 MB) Additional Information: full citation, references, citings, index terms



38 The HyperDisco approach to open hypermedia systems
Uffe Kock Wiil, John J. Leggett

March 1996 Proceedings of the seventh ACM conference on Hypertext

Full text available: pdf (size illegible) Additional Information: full citation, citings, index terms



Keywords: collaborative work, computation, data models, distribution, extensibility, 
heterogeneity, hyperbase management systems, hypermedia platforms, integration, inter- 
tool linking, interoperability, link services, open hypermedia systems, openness, scalability, 
system architectures 



39 Metadata for digital libraries: architecture and design rationale 

Michelle Baldonado, Chen-Chuan K. Chang, Luis Gravano, Andreas Paepcke 

July 1997 Proceedings of the second ACM international conference on Digital libraries 

Full text available: pdf(1.65 MB) Additional Information: full citation, references, citings, index terms



Keywords: CORBA, InfoBus, attrabute model translation, attribute model translation, 
digital libraries, heterogeneity, interoperability, metadata architecture, metadata repository, 
proxy architecture 



40 Coordinating Web services based on business models

Koichi TERAI, Noriaki IZUMI, Takahira YAMAGUCHI 

September 2003 Proceedings of the 5th international conference on Electronic 
commerce 

Full text available: pdf(399.11 KB) Additional Information: full citation, abstract, references

Coordinating Web Services dynamically revolutionizes how each business collaborates for its 
performance improvement, so that many IT researchers are recently focusing on them. In 
order to make the coordination really effective in business, we have to take viewpoints from
business models. Though several specifications and approaches were proposed for the 
coordination, there are no clear relationships between them and business models yet. In 
this paper, we propose a framework for Web Services coor ... 

Keywords: business application development, business model management, reuse of 
business model, web services coordination 
1 Data extraction: Fully automatic wrapper generation for search engines

Hongkun Zhao, Weiyi Meng, Zonghuan Wu, Vijay Raghavan, Clement Yu 

May 2005 Proceedings of the 14th international conference on World Wide Web 

Full text available: pdf(315.59 KB) Additional Information: full citation, abstract, references, index terms

When a query is submitted to a search engine, the search engine returns a dynamically 
generated result page containing the result records, each of which usually consists of a link 
to and/or snippet of a retrieved Web page. In addition, such a result page often also 
contains information irrelevant to the query, such as information related to the hosting site 
of the search engine and advertisements. In this paper, we present a technique for 
automatically producing wrappers that can be used to extr ... 



Keywords: information extraction, search engine, wrapper generation 



2 SDLIP + STARTS = SDARTS: a protocol and toolkit for metasearching
Noah Green, Panagiotis G. Ipeirotis, Luis Gravano 

January 2001 Proceedings of the 1st ACM/IEEE-CS joint conference on Digital libraries 

Full text available: pdf(501.52 KB) Additional Information: full citation, abstract, references, citings, index terms

In this paper we describe how we combined SDLIP and STARTS, two complementary
protocols for searching over distributed document collections. The resulting protocol, which 
we call SDARTS, is simple yet expressible enough to enable building sophisticated 
metasearch engines. SDARTS can be viewed as an instantiation of SDLIP with metasearch- 
specific elements from STARTS. We also report on our experience building three SDARTS- 
compliant wrappers: for locally available plain-text document collect ... 

3 OAI application: Extending SDARTS: extracting metadata from web databases and
interfacing with the Open Archives Initiative

Panagiotis G. Ipeirotis, Tom Barry, Luis Gravano 

July 2002 Proceedings of the 2nd ACM/IEEE-CS joint conference on Digital libraries 

Full text available: pdf(303.33 KB) Additional Information: full citation, abstract, references, index terms

SDARTS is a protocol and toolkit designed to facilitate metasearching. SDARTS combines 
two complementary existing protocols, SDLIP and STARTS, to define a uniform interface 
that collections should support for searching and exporting metasearch-related metadata. 
SDARTS also includes a toolkit with wrappers that are easily customized to make both local 
and remote document collections SDARTS-compliant. This paper describes two significant
ways in which we have extended the SDARTS toolkit. First, we ... 

Keywords: SDLIP, distributed searching, metadata, metasearching, web databases, 
wrapper construction 



4 Towards context-based search engine selection

David B. Leake, Ryan Scherle 

January 2001 Proceedings of the 6th international conference on Intelligent user 
interfaces 

Full text available: pdf(155.22 KB) Additional Information: full citation, abstract, references, citings, index terms

A well-known problem for web search is targeting search on information that satisfies users' 
information needs. User queries tend to be short, and hence often ambiguous, which can 
lead to inappropriate results from general-purpose search engines. This has led to a 
number of methods for narrowing queries by adding information. This paper presents an 
alternative approach that aims to improve query results by using knowledge of a user's 
current activities to select search engines relevant t ... 

Keywords: distributed information systems, intelligent web search, just-in-time 
information access 



A query based approach for integrating heterogeneous data sources 
Ruxandra Domenig, Klaus R. Dittrich 

November 2000 Proceedings of the ninth international conference on Information and 
knowledge management 

Full text available: pdf (size illegible) Additional Information: full citation, references, index terms
knowledge management

Full text available: f|j pdf(340.59 KB) Additional Information: full citation , references, citings , index terms 



Keywords: Bayesian fusion approaches, text extraction, web search 
Nick Craswell, Francis Crimmins, David Hawking, Alistair Moffat

January 2004 Proceedings of the fifteenth conference on Australasian database 
Volume 27 

Full text available: pdf(153.92 KB) Additional Information: full citation, abstract, references

Web search engines crawl the web to fetch the data that they index. In this paper we re- 
examine that need, and evaluate the network costs associated with data acquisition, and 
alternative ways in which a search service might be supported. As a concrete example, we 
make use of the Research Finder search service provided at http://rf.panopticsearch.com, 
and information derived from its crawl and query logs. Based upon an analysis of the 
Research Finder system we introduce a hybrid arrangement, in ... 
Keywords: Web crawling, World-Wide Web, information retrieval, metasearch, search
engine 



Posters: Testbed for information extraction from deep web 
Yasuhiro Yamada, Nick Craswell, Tetsuya Nakatoh, Sachio Hirokawa 
May 2004 Proceedings of the 13th international World Wide Web conference on 
Alternate track papers & posters 

Full text available: pdf (size illegible) Additional Information: full citation, abstract, references, index terms

Search results generated by searchable databases are served dynamically and far larger 
than the static documents on the Web. These results pages have been referred to as the 
Deep Web. We need to extract the target data in results pages to integrate them on 
different searchable databases. We propose a test bed for information extraction from 
search results. We chose 100 databases randomly from 114,540 pages with search forms. 
Therefore, these databases have a good variety. We selected 51 database ... 

Keywords: deep web, meta search, testbed, wrapper 
Matthew Montebello

August 1998 Proceedings of the 21st annual international ACM SIGIR conference on 
Research and development in information retrieval 

Full text available: pdf(296.25 KB) Additional Information: full citation, references, citings, index terms



10 An overview and classification of mediated query systems 
Ruxandra Domenig, Klaus R. Dittrich 

September 1999 ACM SIGMOD Record, Volume 28 issue 3 

Full text available: pdf(597.64 KB) Additional Information: full citation, abstract, citings, index terms

Multimedia technology, global information infrastructures and other developments allow 
users to access more and more information sources of various types. However, the 
"technical" availability alone (by means of networks, WWW, mail systems, databases, etc.) 
is not sufficient for making meaningful and advanced use of all information available on- 
line. Therefore, the problem of effectively and efficiently accessing and querying 
heterogeneous and distributed data sources is an impo ... 

11 Session 9A: applications in commerce: IntelliShopper: a proactive, personal private 
shopping assistant 

Filippo Menczer, W. Nick Street, Narayan Vishwakarma, Alvaro E. Monge, Markus Jakobsson 
July 2002 Proceedings of the first international joint conference on Autonomous 
agents and multiagent systems: part 3 

Full text available: pdf(383.25 KB) Additional Information: full citation, abstract, references, index terms

The IntelliShopper is a shopping assistant designed to empower consumers. It is a personal 
assistant in that it observes the users while shopping and learns their preferences with 
respect to various features that characterize shopping items. It is proactive in that it 
remembers the users' requests and autonomously monitors vendor sites for new items that 
might match the users' needs and preferences. Finally, it protects users' privacy by means 
of pseudonymity, IP anonymizing, and trusted filte ...

Keywords: learning, monitoring, personalization, privacy, pro-activity, shopping 



http://portal.acm.org/resultsxfa 7/18/05 



Results (page 1): +metasearch +wrapper +search engine 






12 Programming by demonstration for information agents

Mathias Bauer, Dietmar Dengler, Gabriele Paul, Markus Meyer 
March 2000 Communications of the ACM, volume 43 issue 3 
13 PERSIVAL, a system for personalized search and summarization over multimedia
healthcare information 

Kathleen R. McKeown, Shih-Fu Chang, James Cimino, Steven Feiner, Carol Friedman, Luis 
Gravano, Vasileios Hatzivassiloglou, Steven Johnson, Desmond A. Jordan, Judith L. Klavans,
Andre Kushniruk, Vimla Patel, Simone Teufel 

January 2001 Proceedings of the 1st ACM/IEEE-CS joint conference on Digital libraries 

Full text available: pdf(369.13 KB) Additional Information: full citation, abstract, references, citings, index terms

In healthcare settings, patients need access to online information that can help them
understand their medical situation. Physicians need information that is clinically relevant to 
an individual patient. In this paper, we present our progress on developing a system, 
PERSIVAL, that is designed to provide personalized access to a distributed patient care 
digital library. Using the secure, online patient records at New York Presbyterian Hospital as 
a user model, PERSIVAL's components tailors ... 

Keywords: medical digital library, multimedia, natural language, personalization, query 
interface, search, summarization 



Luis Gravano, Panagiotis G. Ipeirotis, Mehran Sahami 

January 2003 ACM Transactions on Information Systems (TOIS), volume 21 issue 1 

Full text available: pdf(3.62 MB) Additional Information: full citation, abstract, references, citings, index terms

The contents of many valuable Web-accessible databases are only available through search 
interfaces and are hence invisible to traditional Web "crawlers." Recently, commercial Web 
sites have started to manually organize Web-accessible databases into Yahoo!-like
hierarchical classification schemes. Here we introduce QProber, a modular system that 
automates this classification process by using a small number of query probes, generated 
by document classifiers. QProber can use a variety of types of ... 

Keywords: Database classification, Web databases, hidden Web



15 Internet data management (IDM): Learning query languages of Web interfaces
Andre Bergholz, Boris Chidlovskii 

March 2004 Proceedings of the 2004 ACM symposium on Applied computing 

Full text available: pdf(253.16 KB) Additional Information: full citation, abstract, references

This paper studies the problem of automatic acquisition of the query languages supported 
by a Web information resource. We describe a system that automatically probes the search 
interface of a resource with a set of test queries and analyses the returned pages to 
recognize supported query operators. The automatic acquisition assumes the availability of 
the number of matches the resource returns for a submitted query. The match numbers are 
used to train a learning system and to generate classific ... 






Keywords: hidden Web, learning, query operators, search interface 



16 Probe, count, and classify: categorizing hidden web databases
Panagiotis G. Ipeirotis, Luis Gravano, Mehran Sahami 

May 2001 ACM SIGMOD Record , Proceedings of the 2001 ACM SIGMOD international 
conference on Management of data, volume 30 issue 2 

Full text available: pdf(359.34 KB) Additional Information: full citation, abstract, references, citings, index terms

The contents of many valuable web-accessible databases are only accessible through search 
interfaces and are hence invisible to traditional web "crawlers." Recent studies have 
estimated the size of this "hidden web" to be 500 billion pages, while the size of the 
"crawlable" web is only an estimated two billion pages. Recently, commercial web sites 
have started to manually organize web-accessible databases into Yahoo!-like hierarchical
classification schemes ... 
David Buttler, Ling Liu, Calton Pu, Henrique Paques, Wei Han, Wei Tang
May 2001 ACM SIGMOD Record, Proceedings of the 2001 ACM SIGMOD international
conference on Management of data, volume 30 issue 2

Full text available: pdf(38.25 KB) Additional Information: full citation, index terms



18 Concept-based querying in mediator systems 
Kai-Uwe Sattler, Ingolf Geist, Eike Schallehn 

March 2005 The VLDB Journal — The International Journal on Very Large Data Bases, 

Volume 14 Issue 1 

Full text available: pdf(329.34 KB) Additional Information: full citation , abstract 

One approach to overcoming heterogeneity as a part of data integration in mediator 
systems is the use of metadata in the form of a vocabulary or ontology to represent domain 
knowledge explicitly. This requires including this meta level during query formulation and 
processing. In this paper, we address this problem in the context of a mediator that uses a 
concept-based integration model and an extension of the XQuery language called CQuery. 
This mediator has been developed as part of a project fo ... 

Keywords: Data integration, Mediator systems, Query processing 



19 Predicate rewriting for translating Boolean queries in a heterogeneous information 
system 

Chen-Chuan K. Chang, Hector Garcia-Molina, Andreas Paepcke 

January 1999 ACM Transactions on Information Systems (TOIS), volume 17 issue 1

Searching over heterogeneous information sources is difficult in part because of the
nonuniform query languages. Our approach is to allow users to compose Boolean queries in 
one rich front-end language. For each user query and target source, we transform the user 
query into a subsuming query that can be supported by the source but that may return 
extra documents. The results are then processed by a filter query to yield the correct final 
results. In this article we introduce the architectur ... 

Keywords: Boolean queries, content-based retrieval, filtering, predicate rewriting, query 
subsumption, query translation

20 Semantic caching of Web queries
Boris Chidlovskii, Uwe M. Borghoff

March 2000 The VLDB Journal — The International Journal on Very Large Data Bases, 

Volume 9 Issue 1



Full text available: pdf(235.09 KB) Additional Information: full citation, abstract, citings, index terms

In meta-searchers accessing distributed Web-based information repositories, performance 
is a major issue. Efficient query processing requires an appropriate caching mechanism. 
Unfortunately, standard page-based as well as tuple-based caching mechanisms designed 
for conventional databases are not efficient on the Web, where keyword-based querying is 
often the only way to retrieve data. In this work, we study the problem of semantic caching 
of Web queries and develop a caching mechanism for conjun ... 

Keywords: Experiments, Query algorithms, Region containment, Semantic caching, 
Signature files

21 Editorial: special issue on web content mining

Bing Liu, Kevin Chen-Chuan Chang

December 2004 ACM SIGKDD Explorations Newsletter, Volume 6 issue 2 

Full text available: pdf(178.32 KB) Additional Information: full citation, abstract, references



With the phenomenal growth of the Web, there is an ever-increasing volume of data and
information published in numerous Web pages. The research in Web mining aims to develop
new techniques to effectively extract and mine useful knowledge or information from these
Web pages [8]. Due to the heterogeneity and lack of structure of Web data, automated
discovery of targeted or unexpected knowledge/information is a challenging task. It calls for 
novel methods that draw from a wide range of fields spanni ... 

22 XLibris: an automated library research assistant
Andrew Crossen, Jay Budzik, Mason Warner, Larry Birnbaum, Kristian J. Hammond 
January 2001 Proceedings of the 6th international conference on Intelligent user 
interfaces 

Full text available:
Additional Information: full citation, abstract, references, citings, index terms

While recent work has focused on providing tools and infrastructure for users to access
electronic information over the Internet, the relationship between the physical world and 
information available online has been relatively unexplored. Information about a user's 
location, and the objects she interacts with, can be sufficient to recognize enough of the 
user's task to drive retrieval of online information relevant to the task at hand. The XLibris 
system automatically retrieves, aggregates ... 

Keywords: automated retrieval, information aggregation, metasearch, ubiquitous 
computing 



23 Mediators over taxonomy-based information sources 
Yannis Tzitzikas, Nicolas Spyratos, Panos Constantopoulos 

March 2005 The VLDB Journal — The International Journal on Very Large Data Bases, 

Volume 14 Issue 1 

Full text available: pdf(428.09 KB) Additional Information: full citation, abstract

We propose a mediator model for providing integrated and unified access to multiple 
taxonomy-based sources. Each source comprises a taxonomy and a database that indexes 
objects under the terms of the taxonomy. A mediator comprises a taxonomy and a set of
relations between the mediator's and the sources' terms, called articulations. By combining
different modes of query evaluation at the sources and the mediator and different types of 
query translation, a flexible, efficient scheme ... 

Keywords: Approximate query translation, Information integration, Mediators, Taxonomies 



24 Document adaptation: Lightweight integration of documents and services 
Nkechi Nnadi, Michael Bieber 

October 2004 Proceedings of the 2004 ACM symposium on Document engineering 



Full text available: pdf(314.68 KB) Additional Information: full citation, abstract, references, index terms




This research's primary contribution is providing a relatively straightforward sustainable 
infrastructure for integrating documents and services. Users see a totally integrated 
environment. The integration infrastructure generates supplemental link anchors. Selecting 
one generates a list of relevant links automatically through the use of relationship rules. 



Keywords: automatic link generation, metainformation, relationship rules, service 
integration 



Results 21 - 24 of 24 Result page: previous 1 2




US Patent and Trademark Office for EIC - Advanced Search (INZZ) 







Advanced Search: inspec - 1969 to date (inzz) 



Search history:

No.  Database  Search term                                                  Info added since  Results
1    INZZ      metasearch ADJ wrapper ADJ search ADJ engine                 unrestricted      0
2    INZZ      metasearch SAME wrapper SAME search ADJ engine               unrestricted      0
3    INZZ      metasearch AND wrapper AND search ADJ engine AND component   unrestricted      0
               ADJ object ADJ model




Results (page 1): metasearch +wrapper +com 



Page 1 of 6 




USPTO 



Terms used: metasearch wrapper com

Found 1,507 of 157,873

Results 1 - 20 of 200
Best 200 shown



1 Data extraction: Fully automatic wrapper generation for search engines

Hongkun Zhao, Weiyi Meng, Zonghuan Wu, Vijay Raghavan, Clement Yu 

May 2005 Proceedings of the 14th international conference on World Wide Web 

Full text available: pdf(315.59 KB) Additional Information: full citation, abstract, references, index terms

When a query is submitted to a search engine, the search engine returns a dynamically 
generated result page containing the result records, each of which usually consists of a link 
to and/or snippet of a retrieved Web page. In addition, such a result page often also 
contains information irrelevant to the query, such as information related to the hosting site 
of the search engine and advertisements. In this paper, we present a technique for 
automatically producing wrappers that can be used to extr ... 



Keywords: information extraction, search engine, wrapper generation 



2 SDLIP + STARTS = SDARTS: a protocol and toolkit for metasearching
Noah Green, Panagiotis G. Ipeirotis, Luis Gravano

January 2001 Proceedings of the 1st ACM/IEEE-CS joint conference on Digital libraries

Full text available: pdf(301.52 KB) Additional Information: full citation, abstract, references, citings, index terms

In this paper we describe how we combined SDLIP and STARTS, two complementary
protocols for searching over distributed document collections. The resulting protocol, which 
we call SDARTS, is simple yet expressible enough to enable building sophisticated 
metasearch engines. SDARTS can be viewed as an instantiation of SDLIP with metasearch- 
specific elements from STARTS. We also report on our experience building three SDARTS- 
compliant wrappers: for locally available plain-text document collect ... 

3 Session 9A: applications in commerce: IntelliShopper: a proactive, personal, private
shopping assistant

Filippo Menczer, W. Nick Street, Narayan Vishwakarma, Alvaro E. Monge, Markus Jakobsson 
July 2002 Proceedings of the first international joint conference on Autonomous 
agents and multiagent systems: part 3 

Full text available: pdf(363.25 KB) Additional Information: full citation, abstract, references, index terms

The IntelliShopper is a shopping assistant designed to empower consumers. It is a personal 
assistant in that it observes the users while shopping and learns their preferences with 
respect to various features that characterize shopping items. It is proactive in that it
remembers the users' requests and autonomously monitors vendor sites for new items that
might match the users' needs and preferences. Finally, it protects users' privacy by means
of pseudonymity, IP anonymizing, and trusted filte ...

Keywords: learning, monitoring, personalization, privacy, pro-activity, shopping 

4 An overview and classification of mediated query systems
Ruxandra Domenig, Klaus R. Dittrich 

September 1999 ACM SIGMOD Record, volume 28 issue 3 

Full text available: pdf(897.64 KB) Additional Information: full citation, abstract, citings, index terms

Multimedia technology, global information infrastructures and other developments allow 
users to access more and more information sources of various types. However, the 
"technical" availability alone (by means of networks, WWW, mail systems, databases, etc.) 
is not sufficient for making meaningful and advanced use of all information available
on-line. Therefore, the problem of effectively and efficiently accessing and querying
heterogeneous and distributed data sources is an impo ...

Kathleen R. McKeown, Shih-Fu Chang, James Cimino, Steven Feiner, Carol Friedman, Luis
Gravano, Vasileios Hatzivassiloglou, Steven Johnson, Desmond A. Jordan, Judith L. Klavans,
Andre Kushniruk, Vimla Patel, Simone Teufel

January 2001 Proceedings of the 1st ACM/IEEE-CS joint conference on Digital libraries 

Full text available: pdf(369.13 KB) Additional Information: full citation, abstract, references, citings, index terms

In healthcare settings, patients need access to online information that can help them
understand their medical situation. Physicians need information that is clinically relevant to 
an individual patient. In this paper, we present our progress on developing a system, 
PERSIVAL, that is designed to provide personalized access to a distributed patient care 
digital library. Using the secure, online patient records at New York Presbyterian Hospital as 
a user model, PERSIVAL's components tailor s ... 

Keywords: medical digital library, multimedia, natural language, personalization, query 
interface, search, summarization

Ruxandra Domenig, Klaus R. Dittrich

November 2000 Proceedings of the ninth international conference on Information and 
knowledge management 

Full text available: pdf(213.15 KB) Additional Information: full citation, references, index terms

January 2004 Proceedings of the fifteenth conference on Australasian database
Volume 27

Full text available: pdf(J.53J2 KB) Additional Information: full citation, abstract, references

Web search engines crawl the web to fetch the data that they index. In this paper we re- 
examine that need, and evaluate the network costs associated with data acquisition, and 
alternative ways in which a search service might be supported. As a concrete example, we
make use of the Research Finder search service provided at http://rf.panopticsearch.com,
and information derived from its crawl and query logs. Based upon an analysis of the 
Research Finder system we introduce a hybrid arrangement, in ... 

Keywords: Web crawling, World-Wide Web, information retrieval, metasearch, search 
engine 



8 Posters: Testbed for information extraction from deep web 
Yasuhiro Yamada, Nick Craswell, Tetsuya Nakatoh, Sachio Hirokawa 

May 2004 Proceedings of the 13th international World Wide Web conference on 
Alternate track papers & posters 

Full text available: pdf(24.74 KB) Additional Information: full citation, abstract, references, index terms

Search results generated by searchable databases are served dynamically and far larger 
than the static documents on the Web. These results pages have been referred to as the 
Deep Web. We need to extract the target data in results pages to integrate them on 
different searchable databases. We propose a test bed for information extraction from 
search results. We chose 100 databases randomly from 114,540 pages with search forms. 
Therefore, these databases have a good variety. We selected 51 database ... 

Keywords: deep web, meta search, testbed, wrapper 

9 Mediators over taxonomy-based information sources
Yannis Tzitzikas, Nicolas Spyratos, Panos Constantopoulos 

March 2005 The VLDB Journal — The International Journal on Very Large Data Bases, 

Volume 14 Issue 1 

Full text available: pdf(428.09 KB) Additional Information: full citation, abstract

We propose a mediator model for providing integrated and unified access to multiple 
taxonomy-based sources. Each source comprises a taxonomy and a database that indexes 
objects under the terms of the taxonomy. A mediator comprises a taxonomy and a set of 
relations between the mediator's and the sources' terms, called articulations. By combining 
different modes of query evaluation at the sources and the mediator and different types of 
query translation, a flexible, efficient scheme ... 

Keywords: Approximate query translation, Information integration, Mediators, Taxonomies 



10 internet data management (IDM): Learning query languages of Web interfaces 
Andre Bergholz, Boris Chidlovskii 

March 2004 Proceedings of the 2004 ACM symposium on Applied computing 

Full text available: pdf(253.18 KB) Additional Information: full citation, abstract, references

This paper studies the problem of automatic acquisition of the query languages supported 
by a Web information resource. We describe a system that automatically probes the search 
interface of a resource with a set of test queries and analyses the returned pages to 
recognize supported query operators. The automatic acquisition assumes the availability of 
the number of matches the resource returns for a submitted query. The match numbers are 
used to train a learning system and to generate classific ... 

Keywords: hidden Web, learning, query operators, search interface 

11 Predicate rewriting for translating Boolean queries in a heterogeneous information
system

Chen-Chuan K. Chang, Hector Garcia-Molina, Andreas Paepcke

January 1999 ACM Transactions on Information Systems (TOIS), Volume 17 issue 1

Full text available:
Additional Information: full citation, abstract, references, citings, index terms

Searching over heterogeneous information sources is difficult in part because of the 
nonuniform query languages. Our approach is to allow users to compose Boolean queries in 
one rich front-end language. For each user query and target source, we transform the user 
query into a subsuming query that can be supported by the source but that may return 
extra documents. The results are then processed by a filter query to yield the correct final 
results. In this article we introduce the architectur ... 

Keywords: Boolean queries, content-based retrieval, filtering, predicate rewriting, query 
subsumption, query translation

Keywords: Bayesian fusion approaches, text extraction, web search

March 2000 The VLDB Journal - The International Journal on Very Large Data Bases,

Volume 9 Issue 1 

Full text available: pdf(235.09 KB) Additional Information: full citation, abstract, citings, index terms

In meta-searchers accessing distributed Web-based information repositories, performance 
is a major issue. Efficient query processing requires an appropriate caching mechanism. 
Unfortunately, standard page-based as well as tuple-based caching mechanisms designed 
for conventional databases are not efficient on the Web, where keyword-based querying is 
often the only way to retrieve data. In this work, we study the problem of semantic caching 
of Web queries and develop a caching mechanism for conjun ... 

Keywords: Experiments, Query algorithms, Region containment, Semantic caching, 
Signature files

August 1998 Proceedings of the 21st annual international ACM SIGIR conference on

Nkechi Nnadi, Michael Bieber

October 2004 Proceedings of the 2004 ACM symposium on Document engineering 

Full text available: pdf(314.68 KB) Additional Information: full citation, abstract, references, index terms

This research's primary contribution is providing a relatively straightforward sustainable 
infrastructure for integrating documents and services. Users see a totally integrated 
environment. The integration infrastructure generates supplemental link anchors. Selecting 
one generates a list of relevant links automatically through the use of relationship rules. 



Keywords: automatic link generation, metainformation, relationship rules, service
integration

17 Designing wrapper components for e-services in integrating heterogeneous systems
Massimo Mecella, Barbara Pernici 

August 2001 The VLDB Journal — The International Journal on Very Large Data Bases, 

Volume 10 Issue 1 

Full text available: pdf(292.68 KB) Additional Information: full citation, abstract, citings, index terms

Component-based approaches are becoming more and more popular to support Internet- 
based application development. Different component modeling approaches, however, can 
be adopted, obtaining different abstraction levels (either conceptual or operational). In this 
paper we present a component-based architecture for the design of e-applications, and 
discuss the concept of wrapper components as building blocks for the development of e- 
services, where these services are based on legacy systems. We dis ... 

Keywords: Component, Cooperation, Integration, Legacy system, Wrapper, e-application, 
e-service

Bridget Spitznagel, David Garlan

May 2003 Proceedings of the 25th International Conference on Software Engineering 

Full text available:

Increasingly systems are composed of parts: software components, and the interaction
mechanisms (connectors) that enable them to communicate. When assembling systems
from independently developed and potentially mismatched parts, wrappers may be used to
overcome mismatch as well as to remedy extra-functional deficiencies. Unfortunately the
current practice of wrapper creation and use is ad hoc, resulting in artifacts that are often
hard to reuse or compose, and whose impact is difficult to analyze. ...

Eric Wohlstadter, Stoney Jackson, Premkumar Devanbu

July 2001 Proceedings of the 23rd International Conference on Software Engineering

Software developers writing new software have strong incentives to make their products
compliant to standards such as CORBA, COM, and Java Beans. Standards-compliance 
facilitates inter-operability, component-based software assembly, and software reuse, thus 
leading to improved quality and productivity. Legacy software, on the other hand, is usually 
monolithic, and hard to maintain and adapt. Many organizations, saddled with entrenched 
legacy software, are confronted with the need to ...

Understanding distributed applications is a tedious and difficult task. Visualizations based on
process-time diagrams are often used to obtain a better understanding of the execution of 
the application. The visualization tool we use is Poet, an event tracer developed at the 
University of Waterloo. However, these diagrams are often very complex and do not provide 
the user with the desired overview of the application. In our experience, such tools display 
repeated occurrences of non-trivial commun ...

Full text available: pdf(301.52 KB) Additional Information: full citation, abstract, references, citings, index terms

In this paper we describe how we combined SDLIP and STARTS, two complementary
protocols for searching over distributed document collections. The resulting protocol, which 
we call SDARTS, is simple yet expressible enough to enable building sophisticated 
metasearch engines. SDARTS can be viewed as an instantiation of SDLIP with metasearch- 
specific elements from STARTS. We also report on our experience building three SDARTS- 
compliant wrappers: for locally available plain-text document collect ...

One approach to overcoming heterogeneity as a part of data integration in mediator
systems is the use of metadata in the form of a vocabulary or ontology to represent domain 
knowledge explicitly. This requires including this meta level during query formulation and 
processing. In this paper, we address this problem in the context of a mediator that uses a 
concept-based integration model and an extension of the XQuery language called CQuery. 
This mediator has been developed as part of a project fo ... 

Keywords: Data integration, Mediator systems, Query processing

Multimedia technology, global information infrastructures and other developments allow
users to access more and more information sources of various types. However, the 
"technical" availability alone (by means of networks, WWW, mail systems, databases, etc.) 
is not sufficient for making meaningful and advanced use of all information available
on-line. Therefore, the problem of effectively and efficiently accessing and querying
heterogeneous and distributed data sources is an impo ...

Searching over heterogeneous information sources is difficult in part because of the
nonuniform query languages. Our approach is to allow users to compose Boolean queries in 
one rich front-end language. For each user query and target source, we transform the user 
query into a subsuming query that can be supported by the source but that may return 
extra documents. The results are then processed by a filter query to yield the correct final 
results. In this article we introduce the architectur ... 

Keywords: Boolean queries, content-based retrieval, filtering, predicate rewriting, query 
subsumption, query translation

In healthcare settings, patients need access to online information that can help them
understand their medical situation. Physicians need information that is clinically relevant to 
an individual patient. In this paper, we present our progress on developing a system, 
PERSIVAL, that is designed to provide personalized access to a distributed patient care 
digital library. Using the secure, online patient records at New York Presbyterian Hospital as 
a user model, PERSIVAL's components tailor s ... 

Keywords: medical digital library, multimedia, natural language, personalization, query 
interface, search, summarization

We propose a mediator model for providing integrated and unified access to multiple
taxonomy-based sources. Each source comprises a taxonomy and a database that indexes
objects under the terms of the taxonomy. A mediator comprises a taxonomy and a set of
relations between the mediator's and the sources' terms, called articulations. By combining 
different modes of query evaluation at the sources and the mediator and different types of 
query translation, a flexible, efficient scheme ... 

Keywords: Approximate query translation, Information integration, Mediators, Taxonomies

SDARTS is a protocol and toolkit designed to facilitate metasearching. SDARTS combines
two complementary existing protocols, SDLIP and STARTS, to define a uniform interface 
that collections should support for searching and exporting metasearch-related metadata. 
SDARTS also includes a toolkit with wrappers that are easily customized to make both local 
and remote document collections SDARTS-compliant. This paper describes two significant 
ways in which we have extended the SDARTS toolkit. First, we ... 

Keywords: SDLIP, distributed searching, metadata, metasearching, web databases, 
wrapper construction

The contents of many valuable Web-accessible databases are only available through search
interfaces and are hence invisible to traditional Web "crawlers." Recently, commercial Web
sites have started to manually organize Web-accessible databases into Yahoo!-like
hierarchical classification schemes. Here we introduce QProber, a modular system that 
automates this classification process by using a small number of query probes, generated 
by document classifiers. QProber can use a variety of types of ... 

Keywords: Database classification, Web databases, hidden Web

When a query is submitted to a search engine, the search engine returns a dynamically
generated result page containing the result records, each of which usually consists of a link 
to and/or snippet of a retrieved Web page. In addition, such a result page often also 
contains information irrelevant to the query, such as information related to the hosting site 
of the search engine and advertisements. In this paper, we present a technique for 
automatically producing wrappers that can be used to extr ... 

Keywords: information extraction, search engine, wrapper generation

While recent work has focused on providing tools and infrastructure for users to access
electronic information over the Internet, the relationship between the physical world and 
information available online has been relatively unexplored. Information about a user's 
location, and the objects she interacts with, can be sufficient to recognize enough of the 
user's task to drive retrieval of online information relevant to the task at hand. The XLibris 
system automatically retrieves, aggregates ... 

Keywords: automated retrieval, information aggregation, metasearch, ubiquitous 
computing

Understanding distributed applications is a tedious and difficult task. Visualizations based on
process-time diagrams are often used to obtain a better understanding of the execution of 
the application. The visualization tool we use is Poet, an event tracer developed at the 
University of Waterloo. However, these diagrams are often very complex and do not provide 
the user with the desired overview of the application. In our experience, such tools display 
repeated occurrences of non-trivial commun ...

Distributed data processing is becoming a reality. Businesses want to do it for many
reasons, and they often must do it in order to stay competitive. While much of the 
infrastructure for distributed data processing is already there (e.g., modern network 
technology), a number of issues make distributed data processing still a complex 
undertaking: (1) distributed systems can become very large, involving thousands of 
heterogeneous sites including PCs and mainframe server machines; (2) the stat ... 

Keywords: caching, client-server databases, database application systems, dissemination- 
based information systems, economic models for query processing, middleware, multitier 
architectures, query execution, query optimization, replication, wrappers

Research in process-centered environments (PCEs) has focused on project management
support and has neglected method guidance for the engineers performing the (software) 
engineering process. It has been dominated by the search for suitable process-modeling 
languages and enactment mechanisms. The consequences of process orientation on the 
computer-based engineering environments, i.e., the interactive tools used during process 
performance, have been studied much less. In this article, we prese ... 

Keywords: PRIME, method guidance, process modeling, process-centered environments, 
process-integrated environments, process-sensitive tools, tool integration, tool modeling

Many approaches to reducing the cost, complexity and development time of software
applications have been explored. Several techniques, most recently the Object Oriented 
paradigm, have been developed to facilitate the reuse of code. Even when these techniques 
employ the notion of components, they do so only in a limited way within the language's 
environment at development time. Within the field of Software Engineering, the study of 
Software Architecture has emerged to provide language support for ... 

Keywords: XML, architecture definition language, component, software architecture, 
In meta-searchers accessing distributed Web-based information repositories, performance 
is a major issue. Efficient query processing requires an appropriate caching mechanism. 
Unfortunately, standard page-based as well as tuple-based caching mechanisms designed 
for conventional databases are not efficient on the Web, where keyword-based querying is 
often the only way to retrieve data. In this work, we study the problem of semantic caching 
of Web queries and develop a caching mechanism for conjun ... 

Keywords: Experiments, Query algorithms, Region containment, Semantic caching, 
Component Based Development aims at constructing software through the inter-relationship 
between pre-existing components. However, these components should be bound to a 
specific application domain in order to be effectively reused. Reusable domain components 
and their related documentation are usually stored in a great variety of data sources. Thus, 
a possible solution for accessing this information is to use a software layer that integrates 
different component information sources. We presen ... 

Keywords: component based engineering, component repositories, domain engineering, 
software classification and identification 
We present the design of ObjectGlobe, a distributed and open query processor for Internet 
data sources. Today, data is published on the Internet via Web servers which have, if at all, 
very localized query processing capabilities. The goal of the ObjectGlobe project is to 
establish an open marketplace in which data and query processing capabilities can be 
distributed and used by any kind of Internet application. Furthermore, ObjectGlobe 
integrates cycle providers (i.e., machi ... 

Keywords: Cycle-, function- and data provider, Distributed query processing, Open 
A broad spectrum of electronic commerce applications is currently available on the Web, 
providing services in almost any area one can think of. As the number and variety of such 
applications grow, more business opportunities emerge for providing new services based on 
the integration and customization of existing applications. (Web shopping malls and support 
for comparative shopping are just a couple of examples.) Unfortunately, the diversity of 
applications in each specific domain and the dispar ... 

Keywords: Application integration, Data integration, Electronic commerce 
Business-to-Business (B2B) technologies pre-date the Web. They have existed for at least 
as long as the Internet. B2B applications were among the first to take advantage of 
advances in computer networking. The Electronic Data Interchange (EDI) business standard 
is an illustration of such an early adoption of the advances in computer networking. The 
ubiquity and the affordability of the Web have made it possible for the masses of businesses 
to automate their B2B interactions. However, several issu ... 
ABSTRACT 

In order to access information from a variety of heterogeneous information sources, one has to be 
able to translate queries and data from one data model into another. This functionality is provided by 
so-called (source) wrappers [4,8] which convert queries into one or more commands/queries 
understandable by the underlying source and transform the native results into a format understood 
by the application. As part of the TSIMMIS project [1, 6] we have developed hard-coded wrappers for 
a variety of sources (e.g., Sybase DBMS, WWW pages, etc.) including legacy systems (Folio). 
However, anyone who has built a wrapper before can attest that a lot of effort goes into developing 
and writing such a wrapper. In situations where it is important or desirable to gain access to new 
sources quickly, this is a major drawback. Furthermore, we have also observed that only a relatively 
small part of the code deals with the specific access details of the source. The rest of the code is 
either common among wrappers or implements query and data transformation that could be 
expressed in a high level, declarative fashion. Based on these observations, we have developed a 
wrapper implementation toolkit [7] for quickly building wrappers. The toolkit contains a library for 
commonly used functions, such as for receiving queries from the application and packaging results. It 
also contains a facility for translating queries into source-specific commands, and for translating 
results into a model useful to the application. The philosophy behind our "template-based" translation 
methodology is as follows. The wrapper implementor specifies a set of templates (rules) written in a 
high level declarative language that describe the queries accepted by the wrapper as well as the 
objects that it returns. If an application query matches a template, an implementor-provided action 
associated with the template is executed to provide the native query for the underlying source. 
When the source returns the result of the query, the wrapper transforms the answer which is 
represented in the data model of the source into a representation that is used by the application. 
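
The template-matching step described above can be illustrated with a minimal sketch. It does not use MSL or OEM; a query is assumed to be a plain dictionary of attribute bindings, and the names `Template`, `translate`, and the `lookup ...` command strings are hypothetical stand-ins for the implementor-provided templates, actions, and native queries.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List, Optional


@dataclass(frozen=True)
class Template:
    """Hypothetical template: the attributes a query must bind, plus an action."""
    attributes: FrozenSet[str]
    action: Callable[[Dict[str, str]], str]  # builds the native query string


def translate(query: Dict[str, str], templates: List[Template]) -> Optional[str]:
    """Return the native query produced by the first template the query matches."""
    for template in templates:
        # A query matches when it binds exactly the template's attributes.
        if frozenset(query) == template.attributes:
            return template.action(query)
    return None  # no template accepts this query


# Two templates for an assumed bibliographic source (command syntax is invented).
templates = [
    Template(frozenset({"author"}), lambda q: "lookup author=" + q["author"]),
    Template(frozenset({"title"}), lambda q: "lookup title=" + q["title"]),
]

print(translate({"author": "Smith"}, templates))
```

Adding support for a new kind of query in this sketch means appending one more `Template` to the list, which mirrors the incremental style of wrapper development the abstract describes.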
Using this toolkit one can quickly design a simple wrapper with a few templates that cover some of 
the desired functionality, probably the one that is most urgently needed. However, templates can be 
added gradually as more functionality is required later on. Another important use of wrappers is in 
extending the query capabilities of a source. For instance, some sources may not be capable of 
answering queries that have multiple predicates. In such cases, it is necessary to pose a native query 
to such a source using only predicates that the source is capable of handling. The rest of the 
predicates are automatically separated from the user query and form a filter query. When the 
wrapper receives the results, a post-processing engine applies the filter query. This engine supports 
a set of built-in predicates based on the comparison operators =, ≠, <, >, etc. In addition, the engine 
supports more complex predicates that can be specified as part of the filter query. The 
postprocessing engine is common to wrappers of all sources and is part of the wrapper toolkit. Note 
that because of postprocessing, the wrapper can handle a much larger class of queries than those 
that exactly match the templates it has been given. Figure 1 shows an overview of the wrapper 
architecture as it is currently implemented in our TSIMMIS testbed. Shaded components are provided 
by the toolkit; the white component is source-specific and must be generated by the implementor. 
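
The postprocessing idea described earlier in the abstract (send the source only the predicates it can handle, then filter the returned results with the rest) might be sketched as follows. This is an assumption-laden illustration, not the TSIMMIS engine: predicates are modeled as hypothetical `(attribute, operator, value)` triples, results as dictionaries, and `split_predicates` and `apply_filter` are invented names.

```python
import operator

# Built-in comparison predicates; "!=" is used here as the ASCII form of "not equal".
OPERATORS = {
    "=": operator.eq,
    "!=": operator.ne,
    "<": operator.lt,
    ">": operator.gt,
}


def split_predicates(predicates, supported_attributes):
    """Separate predicates the source can evaluate from those left for the filter."""
    native = [p for p in predicates if p[0] in supported_attributes]
    filter_query = [p for p in predicates if p[0] not in supported_attributes]
    return native, filter_query


def apply_filter(results, filter_query):
    """Keep only the results that satisfy every predicate in the filter query."""
    return [
        row for row in results
        if all(OPERATORS[op](row[attr], value) for attr, op, value in filter_query)
    ]


# The assumed source can only select on "author"; the "year" test is postprocessed.
predicates = [("author", "=", "Smith"), ("year", ">", 1990)]
native, filter_query = split_predicates(predicates, {"author"})

# Stand-in for what the source returns for the native predicates.
source_results = [
    {"author": "Smith", "year": 1988},
    {"author": "Smith", "year": 1995},
]
filtered = apply_filter(source_results, filter_query)
print(filtered)
```

Because the filter runs inside the wrapper rather than at the source, the same two functions could serve every source, which is consistent with the abstract's statement that the postprocessing engine is shared by all wrappers.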
The driver component controls the translation process and invokes the following services: the parser, 
which parses the templates, the native schema, and the incoming queries into internal data 
structures; the matcher, which matches a query against the set of templates and creates a filter 
query for postprocessing if necessary; the native component, which submits the generated action 
string to the source and extracts the data from the native result using the information given in the 
source schema; and the engine, which transforms and packages the result and applies a 
postprocessing filter if one has been created by the matcher. We now describe the sequence of 
events that occur at the wrapper during the translation of a query and its result using an example 
from our prototype system. The queries are formulated using a rule-based language called MSL that 
has been developed as a template specification and query language for the TSIMMIS project. Data is 
represented using our Object Exchange Model (OEM). We will briefly describe MSL and OEM in the 
next section. Details on MSL can be found in [5], a full introduction to OEM is given in [1], 